The rise of hundred-core technologies brings the problem of interprocess synchronization in database engines back to the forefront. Spinlocks are widely used in contemporary DBMS to synchronize processes at microsecond timescale. Latches are Oracle® RDBMS specific spinlocks. Latch contention is commonly observed in contemporary high-concurrency OLTP environments.

In contrast to system spinlocks used in operating system kernels, latches work in user context. Such user-level spinlocks are influenced by context preemption and multitasking. Until recently there were no direct methods to measure the effectiveness of user spinlocks. This became possible with the emergence of the Solaris™ 10 Dynamic Tracing framework. DTrace allows tracing and profiling of both the OS and user applications.

This work investigates the possibilities to diagnose and tune Oracle latches. It explores the contemporary latch realization and spinning-blocking strategies, and analyses the corresponding statistic counters.

A mathematical model is developed to estimate analytically the effect of tuning the _SPIN_COUNT value.

Keywords: Oracle, Spinlock, Latch, DTrace, Spin 
Time, Spin-Blocking 



1 Introduction 



According to the latest Oracle® documentation [1], a latch is "A simple, low-level serialization mechanism to protect shared data structures in the System Global Area".

A huge OLTP Oracle RDBMS "dedicated architecture" instance contains thousands of processes accessing the shared memory. This shared memory is called the "System Global Area" (SGA) and consists of millions of cache, metadata and result structures. Simultaneous process access to these structures is synchronized by Locks, Latches and KGX Mutexes:

Fig. 1. Oracle® RDBMS architecture

Latches and KGX mutexes are the Oracle realizations of the general spin-blocking spinlock concept. The goal of this work is to explore the most commonly used spinlock inside Oracle: latches. Mutexes appeared in the latest Oracle versions, inside the Library Cache only. Table 1 compares these synchronization mechanisms.

Wikipedia defines the spinlock as "a lock where the thread simply waits in a loop ("spins") repeatedly checking until the lock becomes available. As the thread remains active but isn't performing a useful task, the use of such a lock is a kind of busy waiting".

The use of spinlocks for multiprocessor synchronization was first introduced by Edsger Dijkstra in [2]. Since that time, a lot of research has been done in the field of mutual exclusion algorithms. Various sophisticated spinlock realizations were proposed and evaluated. A contemporary review of such algorithms may be found in [3].

There exist two general spinlock types:

• System spinlocks. Kernel OS threads cannot block. The major metrics to optimize system spinlocks are the frequency of atomic operations (or Remote Memory References) and shared bus utilization.

• User application spinlocks, like the Oracle latch and mutex. It is more efficient to poll the latch for several µsec rather than preempt the thread and pay for a 1 ms context switch. The metrics are latch acquisition CPU and elapsed times.

The latch is a hybrid user-level spinlock. The documentation names the subsequent latch acquisition phases as:

• Atomic Immediate Get.

• If missed, the latch spins by polling the location nonatomically during Spin Get.

• If the spin get does not succeed, the latch sleeps for Wait Get.

According to the Anderson classification [4], the latch spin is one of the simplest spinlocks: TTS ("test-and-test-and-set").
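The acquisition phases listed above, with a TTS spin in the middle, can be illustrated by a minimal Python sketch. It is not Oracle's implementation: the class name SpinBlockLatch, its counters and the use of threading.Lock.acquire(blocking=False) as a stand-in for the atomic test-and-set are assumptions for illustration only. Under Python's global interpreter lock the spin buys no real parallelism; the point is the control flow: one atomic immediate get, a loop that reads a plain flag and only then retries the atomic operation, and finally a blocking sleep.

```python
import threading


class SpinBlockLatch:
    """Hypothetical spin-then-block latch (an illustration, not Oracle code).

    threading.Lock.acquire(blocking=False) stands in for the atomic
    test-and-set; self._held is the plain flag that spinners poll.
    """

    def __init__(self, spin_count: int = 2000) -> None:
        self._lock = threading.Lock()
        self._held = False
        self.spin_count = spin_count
        # v$latch-like counters; only the current holder updates them.
        self.gets = self.misses = self.spin_gets = self.sleeps = 0

    def _test_and_set(self) -> bool:
        """Atomic attempt: succeeds only if nobody holds the latch."""
        if self._lock.acquire(blocking=False):
            self._held = True
            return True
        return False

    def get(self) -> None:
        """Willing-to-wait get: immediate get, then spin, then block."""
        missed = spun = slept = False
        if not self._test_and_set():             # 1. immediate get
            missed = True
            for _ in range(self.spin_count):     # 2. spin get (test-and-test-and-set)
                if not self._held and self._test_and_set():
                    spun = True
                    break
            if not spun:                         # 3. block until the holder frees it
                self._lock.acquire()
                self._held = True
                slept = True
        # The latch is held here, so these plain increments are safe.
        self.gets += 1
        self.misses += int(missed)
        self.spin_gets += int(spun)
        self.sleeps += int(slept)

    def free(self) -> None:
        """Release: clear the polled flag, then release the underlying lock."""
        self._held = False
        self._lock.release()
```

Because the counters are updated only while the latch is held, they need no extra protection; and since every miss in this sketch ends either in a spin get or in exactly one sleep, misses always equals spin_gets + sleeps here.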

Spinlocks frequently use more complex structures than TTS. Such algorithms, like the famous MCS spinlocks [5], were designed and benchmarked to work under conditions of 100% latch utilization and may be heavily affected by OS preemption. For the current state of spinlock theory see [6].

If user spinlocks are held for long, for example due to OS preemption, pure spinning becomes ineffective. To overcome this problem, after a predefined number of spin cycles the latch waits (blocks) in a queue. Such spin-blocking was first introduced in [8] to achieve a balance between CPU time lost to spinning and context switch overhead. Optimal strategies for how long to spin before blocking were explored in [9, 10, 11]. The robustness of spin-blocking in contemporary environments was recently investigated in [12].
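The trade-off behind spin-blocking can be made concrete with a toy cost model. The sketch below assumes a fixed blocking cost (one context switch) and a known latch wait, and compares the simple rule "spin for as long as a block would cost, then block" with an offline optimum that knows the wait in advance. This rule is the classic competitive strategy from the spin-versus-block literature and is shown only as an illustration; it is not claimed to be the strategy used by Oracle or the one analysed in [9, 10, 11].

```python
def spin_then_block_cost(wait: float, spin_limit: float, block_cost: float) -> float:
    """CPU cost of spinning up to spin_limit and then blocking (toy model)."""
    if wait <= spin_limit:
        return wait                    # latch freed while we were still spinning
    return spin_limit + block_cost     # spun in vain, then paid for blocking


def offline_optimal_cost(wait: float, block_cost: float) -> float:
    """Cost with perfect foresight: spin or block at once, whichever is cheaper."""
    return min(wait, block_cost)


BLOCK_COST_US = 1000.0                 # assumed context-switch cost, microseconds
for wait_us in (5.0, 500.0, 1000.0, 5000.0):
    ratio = (spin_then_block_cost(wait_us, BLOCK_COST_US, BLOCK_COST_US)
             / offline_optimal_cost(wait_us, BLOCK_COST_US))
    print(f"wait {wait_us:7.1f} us: cost ratio {ratio:.2f}")
```

For waits up to the blocking cost the rule is optimal (ratio 1.00); for longer waits it pays for the spin in vain plus the block, so in this model the ratio never exceeds 2.00.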

Contemporary servers with hundreds of CPU cores bring the problem of spinlock SMP scalability to the forefront. Spinlock utilization increases almost linearly with the number of processors [23]. One percent spinlock utilization on a Dual Core development computer is negligible and may easily be overlooked. However, it may scale up to 50% on a 96-core production server and completely hang a 256-core machine. This phenomenon is also known as "Software lockout".
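Assuming utilization grows exactly in proportion to the core count (the text only says "almost linearly"), the numbers above follow from a one-line extrapolation:

```python
def scaled_utilization(base_util: float, base_cores: int, cores: int) -> float:
    """Extrapolate spinlock utilization linearly with the number of cores."""
    return base_util * cores / base_cores


for cores in (2, 96, 256):
    u = scaled_utilization(0.01, 2, cores)   # 1% measured on a dual-core box
    verdict = "saturated, U > 100%" if u > 1.0 else f"U = {u:.0%}"
    print(f"{cores:3d} cores: {verdict}")
```

This prints 1% for 2 cores, 48% for 96 cores and a saturated latch for 256 cores, where the extrapolated demand exceeds 100%.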

Table 1. Serialization mechanisms in Oracle

|               | Locks         | Latches            | Mutexes          |
|---------------|---------------|--------------------|------------------|
| Access        | Several modes | Types and modes    | Operations       |
| Acquisition   | FIFO          | SIRO (spin) + FIFO | SIRO             |
| SMP atomicity | No            | Yes                | Yes              |
| Timescale     | Milliseconds  | Microseconds       | Sub-microseconds |
| Life cycle    | Dynamic       | Static             | Dynamic          |



1.1 Oracle® RDBMS Performance Tuning overview

During the last 30 years, Oracle developed from the first tiny one-user SQL database into the most advanced contemporary RDBMS engine. Each version introduced new performance and concurrency advances. The following timeline is an excerpt from this evolution:

• v. 2 (1979): the first commercial SQL RDBMS

• v. 3 (1983): the first database to support SMP

• v. 4 (1984): read-consistency, Database Buffer Cache

• v. 5 (1986): Client-Server, Clustering, Distributed Database, SGA

• v. 6 (1988): procedural language (PL/SQL), undo/redo, latches

• v. 7 (1992): Library Cache, Shared SQL, Stored procedures, 64bit

• v. 8/8i (1999): Object types, Java, XML

• v. 9i (2000): Dynamic SGA, Real Application Clusters

• v. 10g (2003): Enterprise Grid Computing, Self-Tuning, mutexes

• v. 11g (2008): Result Cache, SQL Plan Management, Exadata

• v. 12c (2011): Cloud. Not yet released at the time of writing

As of now, Oracle® is the most complex and widely used SQL RDBMS. However, a quick search on Amazon finds more than 100 books devoted to Oracle performance tuning [13, 14, 15]. Dozens of conferences cover this topic every year. Why does Oracle need such tuning?

The main reason is complex and variable workloads. Oracle works in very different environments, ranging from huge OLTPs and petabyte OLAPs to hundreds of tiny instances running on one server. Every database has its own unique features, concurrency and scalability issues.

To give Oracle RDBMS the ability to work in such diverse environments, it has complex internals. The latest Oracle version, 11.2, has 344 "Standard" and 2665 "Hidden" tunable parameters to adjust and customize its behavior. Database administrator education is crucial to adjust these parameters correctly.

Working at Support, I cannot overstate the importance of developer education. During design phases, developers need to make complicated algorithmic, physical database and schema design decisions. Design mistakes and "temporary" workarounds may result in million-dollar losses in production. Many "Database Independence" tricks also result in performance problems.

Another flavor of performance problems comes from self-tuning and SQL plan instabilities, and from OS and hardware issues. One also needs to take into account the more than 10 million bug reports on MyOracleSupport. It is crucial to diagnose the bug correctly.

Historically, latch contention issues were hard to diagnose and resolve. Support engineers definitely need more mainstream science support. This work summarizes the author's investigations in this field.

To allow diagnostics of performance problems, Oracle instrumented its software well. Every Oracle session keeps many statistics counters. These counters describe "what sessions have done". There are 628 statistics in 11.2.0.2.

Oracle Wait Interface events complement the statistics. This instrumentation describes "why Oracle sessions have waited". The latest Oracle version, 11.2.0.2, accounts for 1142 distinct wait events. Statistics and Wait Interface data are used by the Oracle® AWR/ASH/ADDM tools, Tuning Advisors, and MyOracleSupport diagnostics and tuning tools. More than 2000 internal "dynamic performance" X$ tables provide additional data for diagnostics. Oracle performance data are visualized by Oracle Enterprise Manager and other specialized tools.

This is the traditional framework of Oracle performance tuning. However, it was not effective enough for spinlock troubleshooting.

1.2 The Tool 

To discover how the Oracle latch works, we need a tool. The Oracle Wait Interface allows us to explore the waits only. Oracle X$/V$ tables instrument the latch acquisition and give us performance counters. To see how the latch works through time and to observe short-duration events, we need something like a stroboscope in physics. Luckily, such a tool exists in Oracle Solaris™. This is DTrace, the Solaris 10 Dynamic Tracing framework [16].

DTrace is event-driven, kernel-based instrumentation that can see and measure all OS activity. It allows defining probes (triggers) to trap events and writing handlers (actions) in a dynamically interpreted C-like language. No application changes are needed to use DTrace. This is very similar to triggers in database technologies.

DTrace provides more than 40000 probes in the Solaris kernel and the ability to instrument every user instruction. It describes the triggering probe in a four-field format: provider:module:function:name.

A provider is a methodology for instrumenting the system: pid, fbt, syscall, sysinfo, vminfo, ...

If one needs to set a trigger inside the oracle process with Solaris spid 16444, to fire on entry to the function kslgetl (get exclusive latch), the probe description will be pid16444:oracle:kslgetl:entry. The predicate and action of the probe filter, aggregate and print out the data. All the scripts used in this work are collections of such triggers.
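When many such triggers are needed, the probe descriptions can be generated rather than typed. The hypothetical Python helpers below only build and split the four-field provider:module:function:name strings; they do not call DTrace, and the default module name "oracle" simply follows the example above.

```python
from typing import Dict


def probe(pid: int, function: str, name: str = "entry", module: str = "oracle") -> str:
    """Build a pid-provider probe description provider:module:function:name."""
    return f"pid{pid}:{module}:{function}:{name}"


def split_probe(description: str) -> Dict[str, str]:
    """Split a four-field probe description into its named parts."""
    provider, module, function, name = description.split(":")
    return {"provider": provider, "module": module,
            "function": function, "name": name}


print(probe(16444, "kslgetl"))            # pid16444:oracle:kslgetl:entry
print(probe(16444, "kslfre", "return"))   # pid16444:oracle:kslfre:return
```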

Unlike standard tracing tools, DTrace works in the Solaris kernel. When an oracle process enters a probed function, execution goes to the Solaris kernel and DTrace fills buffers with the data. The dtrace program prints out these buffers.

Kernel-based tracing is more stable and has less overhead than userland tracing. DTrace sees all system activity and can take into account the "unaccounted for" userland tracing time associated with kernel calls, scheduling, etc.

DTrace allowed this work to investigate how Oracle 
latches perform in real time: 

• Count the latch spins 

• Trace how the latch waits 

• Measure times and distributions 

• Compute additional latch statistics 

The following sections describe results related to Oracle performance tuning and database administration. Readers interested in mathematical estimations may proceed directly to section 3.

2 Oracle latch instrumentation 

It was known that the Oracle server uses kslgetl (Kernel Service Lock Management Get Latch) to acquire the latch. DTrace reveals other latch interface routines:

• kslgetl(laddr, wait, why, where) - get exclusive latch

• kslg2c(l1, l2, trc, why, where) - get two exclusive child latches

• kslgetsl(laddr, wait, why, where, mode) - get shared latch. In Oracle 11g: ksl_get_shared_latch()

• kslg2cs(l1, l2, mode, trc, why, where) - get two shared child latches

• kslgpl(laddr, comment, why, where) - get parent and all children

• kslfre(laddr) - free the latch



When an Oracle process waits (sleeps) for the latch, it puts the latch address into ksllawat, and the "where" and "why" values into the ksllawer and ksllawhy columns of the corresponding x$ksupr row. This is the fixed table behind the v$process view. These columns are extremely useful when exploring why processes contend for the latch.

Fortunately, Oracle also lets us call these routines with the oradebug call utility, so the latch can be acquired manually:

    SQL> oradebug call kslgetl <laddress> <wait> <why> <where>

This is very useful to simulate latch-related hangs and contention.



DTrace scripts also demonstrated the meaning of the arguments:

• laddress - address of the latch in the SGA

• wait - flag for no-wait or wait latch acquisition

• where - integer code for the location from where the latch is acquired

• why - integer context of why the latch is acquired at this "where"

• mode - requested state for shared latches: 8 - SHARED mode, 16 - EXCLUSIVE mode


This is illustrated in Figure 2.

Fig. 2. The latch is held by a process, not by a session (process fixed array: v$process -> x$ksupr; list of all latches: v$latch -> x$ksllt)

The "where" and "why" parameters are used to instrument the latch get.

The integer "where" value is the reason for latch acquisition. It is the index in an array of "locations" strings that literally describe "where". Oracle externalizes this array to SQL in the x$ksllw fixed table. Database administrators commonly see these strings in v$latch_misses and AWR/Statspack reports.

The fixed view v$latch_misses is based on the x$kslwsc fixed table. In this table Oracle maintains an array of counters for latch misses by "where" location.

The "why" parameter is named "Context saved from call" in dumps. It specifies why the latch is acquired at this "where".

When the latch is acquired, Oracle saves the "where" and "why" values into the latch structure. Oracle 11g externalizes latch structures in the x$kslltr_parent and x$kslltr_children fixed tables for parent and child latches respectively. Versions 10g and before used the x$ksllt table. The fixed views v$latch and v$latch_children were created on these tables.

The "where" and "why" parameters of the last latch acquisition may be seen in the kslltwhr and kslltwhy columns of these tables. The fixed table x$ksuprlat shows the latches that processes are currently holding; the view v$latchholder is created on it. Again, the "where" and "why" parameters of the latch get are present in its ksulawhr and ksulawhy columns.

In summary, Oracle instruments the latch acquisition in the following x$ksupr fields:

• ksllalaq - address of the latch being acquired. Populated during immediate get (and spin before 11g)

• ksllawat - latch being waited for

• ksllawhy - "why" for the latch being waited for

• ksllawer - "where" for the latch being waited for

• ksllalow - bit array of levels of currently held latches

• ksllaspn - latch this process is spinning on. Not populated since 8.1

• ksllaps% - inter-process post statistics

2.1 The latch structure - ksllt 

The latch structure is named ksllt in Oracle fixed tables. It contains the latch location itself, the "where" and "why" values, latch level, latch number, class, statistics, wait list header and other attributes.

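Table 2.1. Latch size (bytes) by Oracle version

| Version         | Unix 32bit | Unix 64bit | Windows 32bit |
|-----------------|------------|------------|---------------|
| 7.3.4           | 92         | -          | 120           |
| 8.0.6           | 104        | -          | 104           |
| 8.1.7           | 104        | 144        | 104           |
| 9.0.1           | ?          | 200        | 160           |
| 9.2.0           | 196        | 240        | 200           |
| 10.1.0          | ?          | 256        | 208           |
| 10.2.0-11.2.0.2 | 100        | 160        | 104           |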


Contrary to popular belief, Oracle latches evolved significantly over the last decade. Not only did additional statistics appear (and disappear) and a new (shared) latch type get introduced; the latch itself changed. Table 2.1 shows how the latch structure size changed by Oracle version. The ksllt size decreased in 10.2 because Oracle made many latch statistics obsolete.

An Oracle latch is not just a single memory location. Before Oracle 11g the value of the first latch byte (word for shared latches) was used to determine the latch state:

• 0x00 - latch is free.

• 0xFF - exclusive latch is busy. Was 0x01 in Oracle 7.

• 0x01, 0x02, etc. - shared latch held by 1, 2, etc. processes simultaneously.

• 0x20000000 | pid - shared latch held exclusively.

In Oracle 11g the first exclusive latch word represents the Oracle pid of the latch holder:

• 0x00 - latch is free.

• 0x12 - Oracle process with pid 18 holds the exclusive latch.
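The state encodings listed above can be summarized in a small decoder. This is a hypothetical helper for reading raw latch words (for example from an SGA dump); it covers only the encodings described in this section and deliberately rejects Oracle 11g shared latches, whose encoding is not described here.

```python
def decode_latch_word(value: int, shared: bool, pre_11g: bool) -> str:
    """Interpret the first latch byte/word using the states listed above."""
    if value == 0x00:
        return "free"
    if pre_11g:
        if shared:
            if value & 0x20000000:                     # 0x20000000 | pid
                return f"held exclusively by pid {value & ~0x20000000}"
            return f"held shared by {value} process(es)"
        if value in (0xFF, 0x01):                      # 0x01 was used in Oracle 7
            return "busy (exclusive)"
        return f"unknown exclusive state {value:#x}"
    if shared:
        raise ValueError("11g shared latch encoding is not described here")
    return f"held exclusively by pid {value}"          # 11g: word = holder pid
```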



2.2 Latch attributes 

According to Oracle documentation and DTrace traces, each latch has at least the following flags and attributes:

• Name - latch name as it appears in V$ views.

• SHR - Is the latch shared? A shared latch is a read-write spinlock.

• PAR - Is the latch solitary, or the parent of a family of child latches? Both parent and child latches share the same latch name. The parent latch can be gotten independently, but may act as a master latch when acquired in special mode in kslgpl().

• G2C - Can two child latches be requested simultaneously in wait mode?

• LNG - Is wait posting used for this latch? Obsolete since Oracle 9.2.

• UFS - Is the latch ultrafast? It will not increment miss statistics when STATISTICS_LEVEL=BASIC. Oracle 10.2 and above.

• Class. 0-7. Spin and wait class assigned to the latch. Oracle 9.2 and above.

The evolution of Oracle latches is summarized in Table 2.2.

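Table 2.2. Latch attributes by Oracle version (number of latches with each attribute)

| Oracle version | Number of latches | PAR | G2C | LNG | UFS | SHR |
|----------------|-------------------|-----|-----|-----|-----|-----|
| 7.3.4.0        | 53                | 14  | 2   | 3   | -   | -   |
| 8.0.6.3        | 80                | 21  | 7   | 3   | -   | 3   |
| 8.1.7.4        | 152               | 48  | 19  | 4   | -   | 9   |
| 9.2.0.8        | 242               | 79  | 37  | -   | -   | 19  |
| 10.2.0.2       | 385               | 114 | 55  | -   | 4   | 47  |
| 10.2.0.3       | 388               | 117 | 58  | -   | 4   | 48  |
| 10.2.0.4       | 394               | 117 | 59  | -   | 4   | 50  |
| 11.1.0.6       | 496               | 145 | 67  | -   | 6   | 81  |
| 11.1.0.7       | 502               | 145 | 67  | -   | 6   | 83  |
| 11.2.0.1       | 535               | 149 | 70  | -   | 6   | 86  |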



To prevent deadlocks, an Oracle process can acquire latches only with a level higher than those it is currently holding. At the same level, the process can request a second G2C latch child X in wait mode after obtaining child Y if and only if the child number of X < the child number of Y. If these rules are broken, the Oracle process raises ORA-600 errors.

The "rising level" rule leads to "trees" of processes waiting for and holding the latches. Due to this rule, contention for higher-level latches frequently exacerbates contention for lower-level latches. These trees can be seen by direct SGA access programs.
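The two ordering rules above can be expressed as a small check. This is an illustrative Python sketch, not Oracle's code: the HeldLatch record, the family names and the level values in any example are made up, and only the two rules stated in the text are modelled.

```python
from typing import NamedTuple, Sequence


class HeldLatch(NamedTuple):
    family: str   # latch name shared by a parent and its children
    level: int    # latch level, 0-14
    child: int    # child latch number
    g2c: bool     # may two children of this family be held in wait mode?


def may_request_in_wait_mode(held: Sequence[HeldLatch], new: HeldLatch) -> bool:
    """Apply the rising-level rule and the same-level G2C rule described above."""
    if not held:
        return True
    top = max(latch.level for latch in held)
    if new.level > top:
        return True                    # strictly higher level: allowed
    if new.level < top:
        return False                   # lower level: would break the level order
    same_level = [latch for latch in held if latch.level == top]
    if len(same_level) != 1 or not new.g2c:
        return False                   # only a second G2C child is allowed
    other = same_level[0]
    return (other.family == new.family and other.g2c
            and new.child < other.child)
```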

Each latch can be assigned to one of 8 classes with different spin and wait policies. By default, all latches belong to class 0. The only exception is the "process allocation" latch, which belongs to class 2. Latch assignment to classes is controlled by the initialization parameter _LATCH_CLASSES. The spinning and waiting policies of each class can be adjusted by 8 parameters named _LATCH_CLASS_0 to _LATCH_CLASS_7.



Latch levels range from 0 to 14; as described above, latches can be requested only in increasing level order to prevent deadlocks.

2.3 Latch Acquisition in Wait Mode

According to the contemporary Oracle 11.2 documentation, latch wait get (kslgetl(laddress,1,...)) proceeds through the following phases:

• One fast Immediate get, no spin.

• Spin get: check the latch up to _SPIN_COUNT times.

• Sleep on the "latch free" wait event with exponential backoff.

• Repeat.

It turns out that such an algorithm was really used ten years ago, in Oracle versions 7.3-8.1. For example, look at the Oracle 8i latch get code flow traced with DTrace:



    kslgetl(0x200058F8, 1, 2, 3)     - KSL GET exclusive Latch
    kslges(0x200058F8, ...)          - wait get
    skgsltst(0x200058F8)             - call repeated 2000 times
    pollsys(..., timeout=10 ms)      - Sleep 1
    skgsltst(0x200058F8)             - call repeated 2000 times
    pollsys(..., timeout=10 ms)      - Sleep 2
    skgsltst(0x200058F8)             - call repeated 2000 times
    pollsys(..., timeout=10 ms)      - Sleep 3
    skgsltst(0x200058F8) ...         - call repeated 2000 times
    pollsys(..., timeout=30 ms)      - Sleep 4 ...



The 2000 cycles is the value of the _SPIN_COUNT initialization parameter. This value could be changed dynamically, without an Oracle instance restart.

The corresponding Oracle event 10046 trace [14] is:

    WAIT #0: nam='latch free' ela= 1 p1=536893688 p2=29 p3=0
    WAIT #0: nam='latch free' ela= 1 p1=536893688 p2=29 p3=1
    WAIT #0: nam='latch free' ela= 1 p1=536893688 p2=29 p3=2
    WAIT #0: nam='latch free' ela= 3 p1=536893688 p2=29 p3=2
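For the "latch free" wait event, p1 is the latch address, p2 the latch number and p3 the number of tries. A short Python sketch that extracts these parameters from the trace lines shows that p1=536893688 is the same latch 0x200058F8 seen in the DTrace output, and that ela (reported in hundredths of a second in Oracle 8i traces) matches the 10 ms and 30 ms pollsys() timeouts. The regular expression assumes exactly the line layout shown above.

```python
import re

# The four 'latch free' lines from the trace above.
TRACE = """\
WAIT #0: nam='latch free' ela= 1 p1=536893688 p2=29 p3=0
WAIT #0: nam='latch free' ela= 1 p1=536893688 p2=29 p3=1
WAIT #0: nam='latch free' ela= 1 p1=536893688 p2=29 p3=2
WAIT #0: nam='latch free' ela= 3 p1=536893688 p2=29 p3=2
"""

WAIT_RE = re.compile(
    r"nam='(?P<event>[^']+)' ela=\s*(?P<ela>\d+) "
    r"p1=(?P<p1>\d+) p2=(?P<p2>\d+) p3=(?P<p3>\d+)"
)


def latch_free_waits(text: str):
    """Yield (latch address, latch number, tries, ela) per 'latch free' wait."""
    for line in text.splitlines():
        m = WAIT_RE.search(line)
        if m and m.group("event") == "latch free":
            yield (int(m.group("p1")), int(m.group("p2")),
                   int(m.group("p3")), int(m.group("ela")))


for addr, number, tries, ela in latch_free_waits(TRACE):
    print(f"latch {addr:#x}  number {number}  tries {tries}  ela {ela}")
```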



The sleep timeouts demonstrate the exponential backoff:

0.01-0.01-0.01-0.03-0.03-0.07-0.07-0.15-0.23-0.39-0.39-0.71-0.71-1.35-1.35-2.0-2.0-2.0-2.0... sec

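This sequence can be almost perfectly fitted by the following formula, where N_wait is the sequence number of the sleep and the timeout is expressed in hundredths of a second:

$$ \text{timeout} = 2^{\left\lfloor (N_{wait}+1)/2 \right\rfloor} - 1 \qquad (1) $$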



However, such a sleep for a predefined time was not efficient. A typical latch holding time is less than 10 microseconds, so a ten-millisecond sleep was far too long. Most waits were for nothing, because the latch was already free. In addition, repeated sleeps resulted in many unnecessary spins, burned CPU and provoked CPU thrashing.

It is not surprising that in Oracle 9.2-11g the exclusive latch get was changed significantly. DTrace demonstrates its code flow:

    kslgetl(0x50006318, 1)
    sskgslgf(0x50006318)=              - Immediate latch get
    kslges(0x50006318, ...)            - Wait latch get
    skgslsgts(..., 0x50006318)         - Spin latch get
    sskgslspin(0x50006318) ...         - repeated 20000 cycles
    kskthbwt(0x0)
    kslwlmod()                         - set up Wait List
    sskgslgf(0x50006318)=              - Immediate latch get
    skgpwwait                          - Sleep latch get
    semop(11, {17,-1,0}, 1)

Note the semop() operating system call. This is an infinite wait until posted: the call blocks the process until another process posts it during latch release.

Therefore, in Oracle 9.2-11.2, all latches in the default class rely on wait posting. The latch sleeps without any timeout. This is more efficient than the previous algorithm. Contemporary latch statistics show that most latch waits are now shorter than 1 ms. In addition, spinning only once reduces CPU consumption.

However, this introduces a problem. If the wakeup post is lost in the OS, waiters will sleep indefinitely. This was a common problem in earlier 2.6.9 Linux kernels. Such losses can lead to an instance hang, because the process will never be woken up. Oracle solves this problem with the _ENABLE_RELIABLE_LATCH_WAITS parameter. It changes the semop() system call to a semtimedop() call with a 0.3 sec timeout.

Latches assigned to a non-default class wait until timeout. The number of spins and the duration of sleeps for class X are determined by the corresponding _LATCH_CLASS_X parameter, which is a string of:

"Spin Yield Waittime Sleep0 Sleep1 ... Sleep7"

A detailed description of non-default latch classes can be found in [21].

DTrace demonstrated that by default the process spins for an exclusive latch for 20000 cycles. This is determined by the static _LATCH_CLASS_0 initialization parameter. The _SPIN_COUNT parameter (2000 by default) is effectively static for exclusive latches [21]. Therefore, the spin count for exclusive latches cannot be changed without an instance restart.

Further DTrace investigations showed that shared latch spin in Oracle 9.2-11g is governed by the _SPIN_COUNT value and can be tuned dynamically. Experiments demonstrated that an X mode shared latch get spins by default up to 4000 cycles. S mode does not spin at all (or spins in an unknown way). A discussion of how Oracle shared latches work can be found in [21]. The results are summarized in Table 2.3.

Table 2.3. Shared latch acquisition

|                | S mode get | X mode get    |
|----------------|------------|---------------|
| Held in S mode | Compatible | 2*_SPIN_COUNT |
| Held in X mode | -          | 2*_SPIN_COUNT |
| Blocking mode  | -          | 2*_SPIN_COUNT |



2.3.1 Latch Release 

An Oracle process releases the latch in kslfre(laddr). To deal with invalidation storms [4], the process releases the latch nonatomically. Then it sets up a memory barrier using an atomic operation on an address individual to each process. This requires less bus invalidation and ensures propagation of the latch release to other local caches.

This is not a fair policy: latch spinners on the local CPU board have preference. However, it is more efficient than atomic release. Finally, the process posts the first process in the list of waiters.
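Release ends with a post to the first waiter, and the lost-wakeup problem of the semop() sleep described earlier concerns exactly this post. The hypothetical Python sketch below contrasts the two sleeping styles: with timeout=None the waiter behaves like an unbounded semop() sleep and depends entirely on being posted, while a finite timeout mimics the semtimedop() behaviour enabled by _ENABLE_RELIABLE_LATCH_WAITS and re-checks the latch after every timeout. threading.Event stands in for the OS semaphore; this is not how Oracle implements posting.

```python
import threading
from typing import Callable, Optional


class LatchWaiter:
    """Hypothetical sleeper that is woken by a post from the latch holder.

    timeout=None behaves like an unbounded semop() sleep; a number behaves
    like the semtimedop() variant, which re-checks the latch after each timeout.
    """

    def __init__(self) -> None:
        self.posted = threading.Event()

    def post(self) -> None:
        """Called by the releasing process after it frees the latch."""
        self.posted.set()

    def sleep_until_free(self, latch_is_free: Callable[[], bool],
                         timeout: Optional[float] = None) -> int:
        """Sleep until the latch looks free; return the number of wakeups."""
        wakeups = 0
        while not latch_is_free():
            self.posted.wait(timeout)   # returns on a post or when timeout expires
            self.posted.clear()
            wakeups += 1
        return wakeups
```

If the releasing thread frees the latch but its post is lost, a waiter created with a finite timeout still notices the free latch after at most one more timeout, whereas timeout=None would sleep forever.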



3 The latch contention 
3.1 Raw latch statistic counters 

Latch statistics are the tool to estimate whether latch acquisition works efficiently or needs tuning. Oracle counts a broad range of latch-related statistics. Table 3.1 contains the descriptions of the v$latch statistics columns from contemporary Oracle documentation [1].

Oracle collects more statistics than are usually consumed by classic queuing models.

Table 3.1. Latch statistics

| Statistic        | Documentation description | When and how it is changed |
|------------------|---------------------------|----------------------------|
| GETS             | Number of times the latch was requested in willing-to-wait mode | Incremented by one after latch acquisition |
| MISSES           | Number of times the latch was requested in willing-to-wait mode and the requestor had to wait | Incremented by one after latch acquisition if a miss occurred |
| SLEEPS           | Number of times a willing-to-wait latch request resulted in a session sleeping while waiting for the latch | Incremented by the number of times the process slept during latch acquisition |
| SPIN_GETS        | Willing-to-wait latch requests which missed the first try but succeeded while spinning | Incremented by one after latch acquisition if a miss but no sleep occurred. Counts only the first spin |
| WAIT_TIME        | Elapsed time spent waiting for the latch (in microseconds) | Incremented by the wait time spent during latch acquisition |
| IMMEDIATE_GETS   | Number of times a latch was requested in no-wait mode | Incremented by one after each no-wait latch get |
| IMMEDIATE_MISSES | Number of times a no-wait latch request did not succeed | Incremented by one after an unsuccessful no-wait latch get |


Since version 10.2 many previously collected latch 
statistics have been deprecated. We have lost important 
additional information about latch performance. Here I 
will discuss the remaining statistics set. 

As was demonstrated in the previous section, since version 9.2 Oracle uses a completely new latch acquisition algorithm:

• Immediate latch get

• Spin latch get

• Add the process to the waiters queue

• Sleep until posted

GETS, MISSES, etc. are integral statistics counted from the startup of the instance. These values depend on the complete workload history. AWR and Statspack reports show the changes of the integral statistics per snapshot interval. Usually these values are "averaged by hour", which is much longer than a typical latch performance spike.

Another problem with AWR/Statspack reports is averaging over child latches. By default AWR gathers only summary data from v$latch. This greatly distorts latch efficiency coefficients. Latch statistics should not be averaged over child latches.

To avoid averaging distortions, the following analysis uses the latch statistics from v$latch_parent and v$latch_children (or x$ksllt in Oracle versions before 11g).

The current workload is characterized by differential 
latch statistics and ratios. 

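Table 3.2. Differential (point in time) latch statistics

| Description          | Definition | AWR equivalent |
|----------------------|------------|----------------|
| Arrival rate         | $\lambda = \frac{\Delta GETS}{\Delta time}$ | "Get Requests" / "Snap Time (Elapsed)" |
| Gets efficiency      | $\rho = \frac{\Delta MISSES}{\Delta GETS}$ | "Pct Get Miss" / 100 |
| Sleeps ratio         | $\kappa = \frac{\Delta SLEEPS}{\Delta MISSES}$ | "Avg Slps /Miss" |
| Wait time per second | $W = \frac{\Delta WAIT\_TIME}{10^{6}\,\Delta time}$ | "Wait Time (s)" / "Snap Time (Elapsed)" |
| Spin efficiency      | $\sigma = \frac{\Delta SPIN\_GETS}{\Delta MISSES}$ | "Spin Gets" / "Misses" |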


There are several ways to choose the basic set of differential statistics. I will use the one closest to AWR/Statspack, containing "Arrival rate", "Gets efficiency", "Spin efficiency", "Sleeps ratio" and "Wait time per second". Table 3.2 defines these quantities.

This work analyzes only wait latch gets. The no-wait (IMMEDIATE_...) gets add some complexity only for a few latches. I will also assume Δtime to be small enough that the workload does not change significantly.

Other statistics reported by AWR depend on these key statistics:

• Latch miss rate is $\frac{\Delta MISSES}{\Delta time} = \rho\lambda$.

• Latch waits (sleeps) rate is $\frac{\Delta SLEEPS}{\Delta time} = \kappa\rho\lambda$.
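The definitions in Table 3.2 translate directly into code. The sketch below uses a hypothetical LatchSnapshot record filled from two samples of the same child latch (for example from v$latch_children) and computes the five key quantities and the two derived rates; the numbers in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class LatchSnapshot:
    """Cumulative counters of one (child) latch at one moment."""
    time_s: float          # sampling time, seconds
    gets: int
    misses: int
    sleeps: int
    spin_gets: int
    wait_time_us: int      # WAIT_TIME is reported in microseconds


def differential_stats(a: LatchSnapshot, b: LatchSnapshot) -> dict:
    """Table 3.2 quantities for the interval between snapshots a and b.

    Assumes the interval contains at least one get and one miss.
    """
    dt = b.time_s - a.time_s
    d_gets = b.gets - a.gets
    d_misses = b.misses - a.misses
    lam = d_gets / dt                                    # arrival rate, gets/s
    rho = d_misses / d_gets                              # gets efficiency
    kappa = (b.sleeps - a.sleeps) / d_misses             # sleeps ratio
    sigma = (b.spin_gets - a.spin_gets) / d_misses       # spin efficiency
    w = (b.wait_time_us - a.wait_time_us) / (1e6 * dt)   # wait time per second
    return {
        "lambda": lam, "rho": rho, "kappa": kappa, "sigma": sigma, "W": w,
        "miss_rate": rho * lam,             # = d(MISSES)/d(time)
        "sleep_rate": kappa * rho * lam,    # = d(SLEEPS)/d(time)
    }


# Hypothetical 10-second interval for a single child latch.
before = LatchSnapshot(0.0, 5_000_000, 90_000, 9_000, 81_500, 2_000_000)
after = LatchSnapshot(10.0, 6_000_000, 110_000, 11_000, 100_000, 2_400_000)
print(differential_stats(before, after))
```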



From the queuing theory point of view, the latch is a G/G/1/(SIRO+FIFO) system with an interesting queue discipline that combines Serve In Random Order spin and First In First Out sleep. Using the latch statistics, I can roughly estimate the queuing characteristics of the latch. I expect the accuracy of such estimations to be about 20-30%.

As a first approximation, I will assume that the incoming latch request stream is Poisson and that latch holding (service) times are exponentially distributed. Therefore, our first latch model will be M/M/1/(SIRO+FIFO).

Oracle measures more statistics than are usually consumed by classic queuing models. It is interesting to see what these additional statistics can be used for.



3.2 Average service time: 

The PASTA (Poisson Arrivals See Time Averages) [20] property connects the ρ ratio with the latch utilization. For Poisson streams the latch gets efficiency should be equal to the utilization:

$$ \rho = \frac{\Delta MISSES}{\Delta GETS} = U = \frac{\Delta\,\text{latch hold time}}{\Delta time} \qquad (2) $$



However, this is not exact for a server with a finite number of processors. The Oracle process occupies the CPU while acquiring the latch. As a result, the latch get sees only the utilization induced by the other N_cpu - 1 processors. Compare this with the MVA [17] arrival theorem. In some benchmarks there may be only N_proc < N_cpu Oracle shadow processes that generate the latch load. In such a case we should substitute N_proc for N_cpu in the following estimate:

$$ \rho = U - \frac{1}{\min(N_{cpu}, N_{proc})}\,U = \eta^{-1}\,U \qquad (3) $$

Here I introduced the multiplier

$$ \eta = \frac{\min(N_{cpu}, N_{proc})}{\min(N_{cpu}, N_{proc}) - 1} $$

to correct the naive utilization estimate, so that U = ηρ. Clearly, the η multiplier confirms that the entire approach is inapplicable to a single CPU machine. Indeed, η differs significantly from one only during demonstrations on my Dual-Core notebook. For servers its impact is below the precision of my estimates. For example, for a small 8 CPU server the η multiplier adds only a 14% correction.

We can experimentally check the accuracy of these formulas and, therefore, of the Poisson arrivals approximation. U can be independently measured by sampling v$latchholder. The latchprofx.sql script by Tanel Poder [18] does this at high frequency. Within our accuracy, we can expect ρ and U to be at least of the same order.

We know that U = λS, where S is the average service (latch holding) time. This allows us to estimate the latch holding time as:

$$ S = \frac{\eta\rho}{\lambda} \qquad (4) $$
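Equation (4), together with the η multiplier from (3), can be evaluated directly from an AWR latch-activity line. In the sketch below the input numbers are hypothetical, and the snapshot time is assumed to be given in seconds.

```python
def eta(n_cpu: int, n_proc: int) -> float:
    """Multiplier eta = n / (n - 1) with n = min(N_cpu, N_proc), see (3)."""
    n = min(n_cpu, n_proc)
    if n < 2:
        raise ValueError("the estimate does not apply to a single CPU")
    return n / (n - 1)


def latch_holding_time(pct_get_miss: float, snap_time_s: float,
                       get_requests: int, n_cpu: int, n_proc: int) -> float:
    """S = eta * rho / lambda, written with AWR latch-activity figures."""
    rho = pct_get_miss / 100.0              # gets efficiency
    lam = get_requests / snap_time_s        # arrival rate, gets per second
    return eta(n_cpu, n_proc) * rho / lam


print(f"{eta(8, 1000) - 1:.0%}")            # the 14% correction for 8 CPUs
# Hypothetical AWR line: 5% Pct Get Miss, 72 million gets in a 3600 s snapshot.
print(f"{latch_holding_time(5.0, 3600.0, 72_000_000, 8, 1000) * 1e6:.2f} us")
```

For 8 CPUs the correction printed is 14%, and the hypothetical snapshot gives a holding time of about 2.86 µs.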



This is interesting. We obtained the first estimate of the latch holding time directly from statistics. In AWR terms this formula looks like

$$ S = \eta \times \frac{\text{"Pct Get Miss"}}{100} \times \frac{\text{"Snap Time"}}{\text{"Get Requests"}} $$



3.3 Wait time: 

Let us look more closely at the summary wait time per second W. Each latch acquisition increments the WAIT_TIME statistic by the amount of time it waited for the latch. According to Little's law, the average latch sleeping time is related to the length L of the wait (sleep) queue:

$$ L = \lambda_{waits} \times (\text{average wait time}) = \lambda\rho\kappa \times \frac{\Delta WAIT\_TIME}{10^{6}\,\Delta SLEEPS} $$

The right-hand side of this identity is exactly the "wait time per second" statistic. Therefore:

$$ W = L \qquad (5) $$



We can experimentally confirm this conclusion, because L can be independently measured by sampling the v$process.latchwait column.
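A minimal sketch of that experimental check, assuming the waiting processes have already been counted at regular intervals (the sample list below is made up): W comes from the WAIT_TIME delta, L from the average of the samples, and the two should agree within the 20-30% accuracy expected of these estimates. The agree() helper and its 30% tolerance are illustrative choices.

```python
from typing import Sequence


def wait_time_per_second(d_wait_time_us: float, dt_s: float) -> float:
    """W from Table 3.2: WAIT_TIME delta (microseconds) per elapsed second."""
    return d_wait_time_us / (1e6 * dt_s)


def mean_waiters(samples: Sequence[int]) -> float:
    """L: average number of waiting processes over periodic samples, e.g. the
    count of v$process rows whose latchwait points at this latch."""
    return sum(samples) / len(samples)


def agree(w: float, l: float, rel_tol: float = 0.3) -> bool:
    """True if W and L match within the relative tolerance."""
    return abs(w - l) <= rel_tol * max(w, l)


# Hypothetical data: 20 samples taken over the same 10 s interval.
samples = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
w = wait_time_per_second(1_500_000, 10.0)
l = mean_waiters(samples)
print(w, l, agree(w, l))
```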



3.4 Recurrent sleeps: 

In the ideal situation, the process spins and sleeps only once. Consequently, the latch statistics should satisfy the following identity:

$$ MISSES = SPIN\_GETS + SLEEPS \qquad (6) $$

Or, equivalently:

$$ 1 = \sigma + \kappa \qquad (7) $$



In reality, some processes had to sleep for the latch sev- 
eral times. This occurred when the sleeping process was 
posted, but another process got the latch before the first 
process received the CPU. The awakened process spins 
and sleeps again. As a results the previously equality 
became invalid. 

Before version 10.2 Oracle directly counted these sequential waits in separate SLEEP1-SLEEP3 statistics. Since 10.2 these statistics became obsolete. However, we can estimate the rate of such "sleep misses" from other basic statistics. A recurrent sleep increments only the SLEEPS counter; the SPIN_GETS statistic does not change. Thus σ + κ − 1 is the ratio of inefficient latch sleeps to misses. The ratio of "unsuccessful sleeps" to "sleeps" is given by:

$$\text{Recurrent sleeps ratio} = \frac{\sigma + \kappa - 1}{\kappa} \tag{8}$$

Normally this ratio should be close to ρ. Frequent "unsuccessful sleeps" are inefficient and may be a symptom of OS wait posting problems or of a bursty workload.
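Formula (8) can be computed straight from V$LATCH counter deltas. In this sketch the LatchDelta record, the function name and the sample numbers are hypothetical. Note that (σ + κ − 1)/κ simplifies to (SPIN_GETS + SLEEPS − MISSES)/SLEEPS, which avoids floating-point cancellation.

```python
from dataclasses import dataclass


@dataclass
class LatchDelta:
    """Hypothetical record: V$LATCH counter deltas over one interval."""
    gets: int
    misses: int
    spin_gets: int
    sleeps: int


def sleep_ratios(d: LatchDelta) -> dict:
    """rho, sigma, kappa and the recurrent sleeps ratio (8)."""
    rho = d.misses / d.gets           # miss ratio
    sigma = d.spin_gets / d.misses    # "Spin gets/Miss"
    kappa = d.sleeps / d.misses       # "Sleeps/Miss"
    # (sigma + kappa - 1) / kappa == (SPIN_GETS + SLEEPS - MISSES) / SLEEPS
    recurrent = (d.spin_gets + d.sleeps - d.misses) / d.sleeps if d.sleeps else 0.0
    return {"rho": rho, "sigma": sigma, "kappa": kappa, "recurrent": recurrent}


if __name__ == "__main__":
    stats = sleep_ratios(LatchDelta(gets=100_000, misses=1_000, spin_gets=950, sleeps=70))
    print({k: round(v, 3) for k, v in stats.items()})
```

Here the recurrent sleeps ratio is about 0.29, far above ρ = 0.01; by the criterion above this would hint at posting problems or a bursty workload.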



3.5 Latch acquisition time: 

The average latch acquisition time is the sum of spin time and wait time. Oracle does not directly measure the spin time. However, we can measure it on the Solaris platform using DTrace.

On other platforms, we have to rely on statistics. Fortunately, in Oracle 9.2-10.2 one can count the average number of spinning processes by sampling x$ksupr.ksllalaq. The process sets this column equal to the address of the acquired latch during the active phase of the latch get. Oracle 8i and earlier even filled v$process.latchspin during latch spinning.

Little's law allows us to connect the average number of spinning processes with the spinning time:

$$N_s = \lambda T_s \tag{9}$$

where T_s is the average spin time per latch get. As a result the average latch acquisition time is:

$$T_a = \frac{1}{\lambda}\,(N_s + W) \tag{10}$$

Note that according to general queuing theory the "Serve In Random Order" discipline of latch spin does not affect the average latch acquisition time, which is independent of the queuing discipline. In steady state, the number of processes served during the passage of an incoming request through the system should be equal to the number of spinning and waiting processes.

In Oracle 11g the latch spin is no longer instrumented due to a bug. The 11g spin is invisible to SQL. This does not allow us to estimate N_s and related quantities.



3.6 Comparison of results

Let me compare the results of DTrace measurements and latch statistics. Typical demonstration results for our 2 CPU X86 server are:

/usr/sbin/dtrace -s latch_times.d -p 17242 0x5B7C75F8

latch gets traced: 165180
"Library cache latch", address=5b7c75f8

Acquisition time:
           value  count
            4096  0
            8192  7324
           16384  151748
           32768  4493
           65536  1676
          131072  988
          262144  464
          524288  225
         1048576  211
         2097152  53
         4194304  21
         8388608  1
        16777216  1
        33554432  0

Holding time:
           value  count
            8192  0
           16384  105976
           32768  50877
           65536  6962
          131072  1986
          262144  829
          524288  330
         1048576  205
         2097152  34
         4194304  6
         8388608  0

Average acquisition time =26 us
Average holding time =37 us

The above histograms show the latch acquisition and holding time distributions on a logarithmic scale. Values are in nanoseconds. Compare the above average times with the results of latch statistics analysis under the same conditions:



Latch statistics for 0x5B7C75F8 "library cache" level#=5 child#=1
Requests rate:         lambda= 20812.2 Hz
Miss /get:             rho= 0.078
Est. Utilization:      eta*rho= 0.156
Sampled Utilization:   U= 0.143
Slps /Miss:            kappa= 0.013
Wait_time/sec:         W= 0.025
Sampled queue length:  L= 0.043
Spin_gets/miss:        sigma= 0.987
Sampled spinning:      Ns= 0.123
Derived statistics:
Secondary sleeps ratio = 0.01
Avg latch holding time = 7.5 us
         sleeping time = 1.2 us
      acquisition time = 7.2 us

We can see that ηρ and W are close to the sampled U and L respectively. The holding and acquisition times from both methods are of the same order. Since both methods are intrusive, this is a remarkable agreement. Measurements of latch times and distributions for demo and production workloads lead to the conclusion that:

The latch holding time on contemporary servers should normally be in the microseconds range.
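The derived statistics in the listing follow from (4), W/λ and (10). This sketch recomputes them from the rounded values printed there; the function name is hypothetical. It reproduces 7.5 us and 1.2 us, but gives about 7.1 us rather than 7.2 us for the acquisition time, presumably because the original script worked from unrounded counters.

```python
def derived_latch_stats(lam, rho, eta, w, n_s):
    """Holding, sleeping and acquisition times (seconds) per latch get.

    lam -- gets per second, rho -- MISSES/GETS, eta -- CPU correction,
    w   -- WAIT_TIME per second, n_s -- average number of spinning processes.
    """
    holding = eta * rho / lam        # (4):  S = eta * rho / lambda
    sleeping = w / lam               # wait time per get
    acquisition = (n_s + w) / lam    # (10): T_a = (N_s + W) / lambda
    return holding, sleeping, acquisition


if __name__ == "__main__":
    # Rounded values from the "library cache" listing.
    times = derived_latch_stats(20812.2, 0.078, 2.0, 0.025, 0.123)
    print(["%.1f us" % (t * 1e6) for t in times])
```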



4 Latch contention in Oracle 9.2-11g

Latch contention should be suspected if latch wait events are observed in the Top 5 Timed Events section of AWR. Look for the latches with the highest W. Symptoms of contention for a latch are highly variable. The most commonly observed include:

• W > 0.1 sec/sec

• Utilization > 10%

• Acquisition (or sleeping) time significantly greater than holding time

The V$LATCH_MISSES fixed view and the latchprofx.sql script by Tanel Poder [18] reveal "where" the contention arises. One should always take into account that contention for a high-level latch frequently exacerbates contention for lower-level latches [13].

How should latch contention be treated? During the last 15 years, latch performance tuning focused on application tuning and reducing the latch demand. To achieve this one needs to tune the SQL statements, use bind variables, change the physical schema, etc. Classic Oracle performance books explore these topics [13, 14, 15].

However, this tuning methodology may be too expensive and may even require a complete application rewrite. This work explores the complementary possibility of changing _SPIN_COUNT. This is commonly treated as old-style tuning that should be avoided by all means. Increasing the spin count may lead to wasted CPU. However, nowadays CPU power is cheap, and we may already have enough free resources. We need to find the conditions under which spin count tuning may be beneficial.

Processes spin for an exclusive latch up to 20000 cycles, for a shared latch up to 4000 cycles, and infinitely for a mutex. Tuning may find more optimal values for your application.

Oracle does not explicitly forbid spin count tuning. However, any change of an undocumented parameter should be discussed with Oracle Support.



4.1 Spin count adjustment

Spin count tuning depends on the latch type. For shared latches:

• Spin count can be adjusted dynamically by the _SPIN_COUNT parameter.

• A good starting point is a multiple of the default 2000 value.

• Setting the _SPIN_COUNT parameter in the initialization file should be accompanied by _LATCH_CLASS_0="20000". Otherwise the spin for exclusive latches will be greatly affected at the next instance restart.

On the other hand, if the contention is for exclusive latches, then:

• Spin count adjustment by the _LATCH_CLASS_0 parameter needs an instance restart.

• A good starting point is a multiple of the default 20000 value.

• It may be preferable to increase the number of "yields" for class latches.

In order to tune the spin count efficiently, the root cause of latch contention must be diagnosed. Obviously, spin count tuning will only be effective if the latch holding time S is in its normal microseconds range. At any time the number of spinning processes should remain less than the number of CPUs.

It is a common myth that CPU consumption will rise without limit as we increase the spin count. In fact, the process spins only up to the "residual latch holding time". The next section explores this.
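The symptom list and the preconditions for spin count tuning can be encoded as a simple checklist. The thresholds W > 0.1 sec/sec and utilization > 10% are taken from the text; the factor that defines "significantly greater" and the 100 us cut-off standing in for a "microseconds range" holding time are assumptions of this sketch, as are the function names.

```python
def contention_symptoms(w, utilization, acquisition_s, holding_s, factor=10.0):
    """List the latch contention symptoms of Section 4 that are present.

    w            -- WAIT_TIME per second (sec/sec)
    utilization  -- latch utilization U (0..1)
    factor       -- assumed meaning of "significantly greater"
    """
    found = []
    if w > 0.1:
        found.append("W > 0.1 sec/sec")
    if utilization > 0.10:
        found.append("utilization > 10%")
    if acquisition_s > factor * holding_s:
        found.append("acquisition time >> holding time")
    return found


def spin_tuning_may_help(holding_s, n_spinning, n_cpu, max_holding_s=100e-6):
    """Spin count tuning is only worth trying for microsecond-scale holding
    times and while fewer processes spin than there are CPUs.
    max_holding_s is an assumed cut-off for the "microseconds range"."""
    return holding_s <= max_holding_s and n_spinning < n_cpu


if __name__ == "__main__":
    # Demo numbers from Section 3.6.
    print(contention_symptoms(0.025, 0.143, 7.2e-6, 7.5e-6))
    print(spin_tuning_may_help(7.5e-6, 0.123, 2))
```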



5 Latch spin CPU time 

The spin probes the latch holding time distribution. To predict the effect of _SPIN_COUNT tuning, let me introduce a mathematical model. It extends the model used in [9] to a general latch holding time distribution. As a cost function, I will estimate the CPU time consumed while spinning.

Consider a general stream of latch acquisition events. The latch was acquired by some process at time T_k and released at T_k + h_k, k ∈ ℕ. Here h_k is the latch holding time, distributed with p.d.f. p(t). I will assume that T_k and h_k are generally independent for any k and form a recurrent stream. Furthermore, I assume here the existence of at least second moments for all the distributions.

If T_{k+1} < T_k + h_k, then the latch will be busy when the next process tries to acquire it, and a latch miss will occur. In this case the process will spin for the latch up to time Δ. The spin get will succeed if:

$$T_{k+1} + \Delta > T_k + h_k$$

The process will sleep for the latch if T_{k+1} + Δ < T_k + h_k.

Therefore, the conditions for the latch wait acquisition phases are:

$$\begin{aligned}
&\text{latch miss:} && T_{k+1} < T_k + h_k,\\
&\text{latch spin get:} && T_k + h_k - \Delta < T_{k+1} < T_k + h_k,\\
&\text{latch sleep:} && T_{k+1} + \Delta < T_k + h_k.
\end{aligned} \tag{11}$$
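Conditions (11) translate directly into a classifier. This is a minimal sketch with a hypothetical function name; times are in arbitrary but consistent units.

```python
def acquisition_phase(t_k, h_k, t_next, spin_limit):
    """Classify the (k+1)-th latch get using conditions (11).

    t_k        -- time the latch was acquired by the previous holder
    h_k        -- its holding time
    t_next     -- time the next process requests the latch
    spin_limit -- maximum spin time Delta
    Boundary ties have probability zero and are assigned arbitrarily.
    """
    release = t_k + h_k
    if t_next >= release:
        return "immediate get"   # no miss: latch already released
    if t_next + spin_limit > release:
        return "spin get"        # miss, but the latch is released while spinning
    return "sleep"               # miss, spin exhausted before release


if __name__ == "__main__":
    for t_next in (12.0, 7.0, 2.0):
        print(t_next, acquisition_phase(0.0, 10.0, t_next, 5.0))
```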

If a latch miss occurs, then the second process will observe that the latch remains busy for:

$$\tau_{k+1} = T_k + h_k - T_{k+1} \tag{12}$$

This is the "residual time" [20], or the time until the first event [22] of latch release. Its distribution differs from that of h_k. To reflect this, I will add the subscript l to all residual distributions. In addition, I will omit the subscript k for the stationary state.

Let me denote the probability that a missed process sees the latch released at a time less than t as:

$$P_l(\tau < t) = P_l(t)$$

and the probability of not releasing the latch during time t is Q_l(τ > t) = 1 − P_l(τ < t). Therefore, the probability to spin for the latch during a time less than t is

$$P_{sg}(t_s < t) = \begin{cases} P_l(\tau < t), & t < \Delta \\ 1, & t \ge \Delta \end{cases} \tag{13}$$

and has a discontinuity at t = Δ because the process acquiring the latch never spins more than Δ. The magnitude of this discontinuity is 1 − P_l(Δ). This is the probability of a latch sleep.




Therefore, the spinning probability density function has a bump at Δ:

$$p_{sg}(t) = p_l(t)\,H(\Delta - t) + \bigl(1 - P_l(\Delta)\bigr)\,\delta(t - \Delta) \tag{14}$$

Here H(x) and δ(x) are the Heaviside step and Dirac delta functions respectively. Spin efficiency is the probability to obtain the latch during the spin get:

$$\sigma = \int_0^{\Delta - 0} p_{sg}(t)\,dt = P_l(\Delta) = 1 - Q_l(\Delta) \tag{15}$$



Oracle allows measuring the average number of spinning processes. This quantity is proportional to the average CPU time spent while spinning for the latch:

$$\Gamma_{sg} = \int t\,p_{sg}(t)\,dt = \int_0^\Delta t\,p_l(t)\,dt + \Delta\bigl(1 - P_l(\Delta)\bigr) \tag{16}$$



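Integrating by parts, both expressions may be rewritten in the form:

$$\sigma = 1 - Q_l(\Delta), \qquad \Gamma_{sg} = \Delta - \int_0^\Delta P_l(t)\,dt = \int_0^\Delta Q_l(t)\,dt \tag{17}$$

or, equivalently:

$$\sigma = 1 - Q_l(\Delta), \qquad \Gamma_{sg} = \langle t_l\rangle - \int_\Delta^\infty Q_l(t)\,dt \tag{18}$$

where ⟨t_l⟩ = ∫_0^∞ Q_l(t) dt is the average residual time.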
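Equations (15)-(17) say that a missed process spins for min(τ, Δ): σ is the probability that τ < Δ, and Γ_sg is the mean of min(τ, Δ). The Monte Carlo sketch below checks this for an assumed residual time uniformly distributed on [0, R], for which (17) gives σ = Δ/R and Γ_sg = Δ − Δ²/(2R) when Δ ≤ R. The function name and parameters are illustrative.

```python
import random


def spin_statistics_mc(sample_residual, spin_limit, n=200_000, seed=7):
    """Estimate (sigma, Gamma_sg) by simulating n latch misses.

    sample_residual(rng) draws the residual holding time tau seen by a
    process that missed the latch; the process spins for min(tau, Delta).
    """
    rng = random.Random(seed)
    spin_gets = 0
    total_spin = 0.0
    for _ in range(n):
        tau = sample_residual(rng)
        if tau < spin_limit:
            spin_gets += 1            # latch released during the spin
            total_spin += tau
        else:
            total_spin += spin_limit  # spin exhausted: the process sleeps
    return spin_gets / n, total_spin / n


if __name__ == "__main__":
    R, DELTA = 10.0, 4.0
    sigma, gamma = spin_statistics_mc(lambda rng: rng.uniform(0.0, R), DELTA)
    print(f"sigma    = {sigma:.3f} (theory {DELTA / R:.3f})")
    print(f"Gamma_sg = {gamma:.3f} (theory {DELTA - DELTA**2 / (2 * R):.3f})")
```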



According to classic considerations from renewal theory [20], the distribution of the residual time is the transformed latch holding time distribution:

$$p_l(t) = \frac{1}{\langle t\rangle}\bigl(1 - P(t)\bigr) = \frac{Q(t)}{\langle t\rangle}$$

The average residual latch holding time is ⟨t_l⟩ = ⟨t²⟩/(2⟨t⟩).
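This is the classic inspection paradox: a process that finds the latch busy is more likely to land inside a long hold. The sketch below lays assumed holding times end to end, drops miss instants uniformly over the busy time (which, under the model's Poisson assumptions, is where missing arrivals fall) and compares the mean residual with ⟨t²⟩/(2⟨t⟩). For exponential holds the mean residual equals the mean hold; for constant holds it is half of it. The function name is hypothetical.

```python
import bisect
import itertools
import random


def mean_residual(sample_holding, n_holds=100_000, n_probes=100_000, seed=3):
    """Return (Monte Carlo mean residual, <t^2> / (2 <t>)) for sampled holds."""
    rng = random.Random(seed)
    holds = [sample_holding(rng) for _ in range(n_holds)]
    releases = list(itertools.accumulate(holds))  # back-to-back release instants
    busy = releases[-1]
    total = 0.0
    for _ in range(n_probes):
        x = rng.uniform(0.0, busy)                # a miss happens at instant x
        i = min(bisect.bisect_right(releases, x), n_holds - 1)
        total += releases[i] - x                  # time until that hold ends
    m1 = sum(holds) / n_holds
    m2 = sum(h * h for h in holds) / n_holds
    return total / n_probes, m2 / (2.0 * m1)


if __name__ == "__main__":
    print("exponential, mean 10:", mean_residual(lambda rng: rng.expovariate(0.1)))
    print("constant 10:        ", mean_residual(lambda rng: 10.0))
```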
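Incorporating this into the previous formulas for spin efficiency and CPU time results in:

$$\sigma = \frac{1}{\langle t\rangle}\int_0^\Delta Q(t)\,dt, \qquad \Gamma_{sg} = \frac{1}{\langle t\rangle}\int_0^\Delta dt\int_t^\infty Q(z)\,dz \tag{19}$$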



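These nice formulas reassure us that the observables explored are not artifacts. In terms of the holding time p.d.f. they read:

$$\sigma = \frac{1}{\langle t\rangle}\int_0^\Delta dt\int_t^\infty p(z)\,dz, \qquad \Gamma_{sg} = \frac{1}{\langle t\rangle}\int_0^\Delta dt\int_t^\infty dz\int_z^\infty p(x)\,dx \tag{20}$$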
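Formula (19) only needs the survival function Q(t), so it is straightforward to evaluate numerically. This sketch uses the trapezoid rule on a truncated grid (t_max stands in for infinity, an approximation), and the function names are illustrative. For exponential holding times, Q(t) = exp(−t/⟨t⟩), (19) integrates to σ = 1 − exp(−Δ/⟨t⟩) and Γ_sg = ⟨t⟩σ, which serves as the check.

```python
import math


def trapezoid(values, h):
    """Trapezoid rule for equally spaced samples with step h."""
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


def spin_model(q, mean_hold, spin_limit, t_max, steps=20_000):
    """Return (sigma, Gamma_sg) from formula (19) for survival function q."""
    if not 0.0 <= spin_limit <= t_max:
        raise ValueError("spin_limit must lie within [0, t_max]")
    h = t_max / steps
    qv = [q(i * h) for i in range(steps + 1)]
    # tail[i] approximates the inner integral of Q(z) from t_i to t_max.
    tail = [0.0] * (steps + 1)
    for i in range(steps - 1, -1, -1):
        tail[i] = tail[i + 1] + 0.5 * h * (qv[i] + qv[i + 1])
    k = round(spin_limit / h)  # grid index of Delta
    sigma = trapezoid(qv[: k + 1], h) / mean_hold
    gamma = trapezoid(tail[: k + 1], h) / mean_hold
    return sigma, gamma


if __name__ == "__main__":
    s, g = spin_model(lambda t: math.exp(-t), 1.0, 2.0, 40.0)
    print(f"sigma={s:.6f}  Gamma_sg={g:.6f}  exact={1 - math.exp(-2.0):.6f}")
```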



Assuming the existence of second moments for the latch holding time distribution, we can proceed further. It is possible to change the integration order using:

$$\int_t^\infty dz\int_z^\infty p(x)\,dx = \int_t^\infty z\,p(z)\,dz - t\int_t^\infty p(z)\,dz$$

Utilizing this identity twice, we arrive at the following expression:

$$\Gamma_{sg} = \frac{1}{2\langle t\rangle}\int_0^\Delta t^2\,p(t)\,dt + \frac{\Delta}{\langle t\rangle}\int_\Delta^\infty \Bigl(t - \frac{\Delta}{2}\Bigr)p(t)\,dt$$

I will focus on two regions where analytical estimates are possible. To estimate the effect of spin count tuning, we need approximate scaling rules depending on the value of the "spin efficiency" σ = "Spin gets/Miss".



5.1 Spin count tuning when spin efficiency is low

The spin may be inefficient, σ ≪ 1. In this low efficiency region, (20) can be rewritten in the form:

$$\sigma = \frac{\Delta}{\langle t\rangle} - \frac{1}{\langle t\rangle}\int_0^\Delta (\Delta - t)\,p(t)\,dt, \qquad \Gamma_{sg} = \Delta - \frac{\Delta^2}{2\langle t\rangle} + \frac{1}{2\langle t\rangle}\int_0^\Delta (\Delta - t)^2\,p(t)\,dt \tag{21}$$

It is clear that such a spin probes the latch holding time distribution around the origin. Other parts of the latch holding time distribution impact spinning efficiency and CPU consumption only through the average holding time ⟨t⟩. This allows us to estimate how these quantities depend on a change of _SPIN_COUNT (Δ).

If processes never release the latch immediately (p(0) = 0), then

$$\sigma = \frac{\Delta}{\langle t\rangle} + O(\Delta^3), \qquad \Gamma_{sg} = \Delta - \frac{\Delta^2}{2\langle t\rangle} + O(\Delta^4) \tag{22}$$

For Oracle performance tuning purposes we need to know what happens if we double the spin count:

In the low efficiency region, doubling the spin count will double the "spin efficiency" and also double the CPU consumption.

These estimates are especially useful in the case of severe latch contention and for another type of Oracle spinlock, the mutex.



5.2 Spin count tuning when efficiency is high

In the high efficiency region, the sleep cuts off the tail of the latch holding time distribution:

$$\sigma = 1 - \frac{1}{\langle t\rangle}\int_\Delta^\infty (t - \Delta)\,p(t)\,dt, \qquad \Gamma_{sg} = \frac{\langle t^2\rangle}{2\langle t\rangle} - \frac{1}{2\langle t\rangle}\int_\Delta^\infty (t - \Delta)^2\,p(t)\,dt$$

Oracle normally operates in this region of small latch sleep ratio. Here the spin count is much greater than the number of instructions protected by the latch, Δ ≫ ⟨t⟩.

From the above it is clear that the spin time is bounded by both the "residual latch holding time" and the spin count:

$$\Gamma_{sg} \le \min\left(\frac{\langle t^2\rangle}{2\langle t\rangle},\ \Delta\right)$$

Sleep prevents the process from wasting CPU spinning through the heavy tail of the latch holding time distribution.
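The bound on Γ_sg is what keeps spinning affordable when a few latch holders are preempted. In this sketch the holding time follows an assumed two-point distribution: 99% of holds last 10 us and 1% last 10 ms (a hypothetical preempted holder). The Section 5.2 expressions, which are exact for any Δ, are evaluated as sums. With Δ = 50 us the spin CPU stays below Δ although the mean residual time is about 4.55 ms; only a spin limit that covers the whole tail lets Γ_sg reach that residual.

```python
def spin_model_discrete(holds, probs, spin_limit):
    """(sigma, Gamma_sg, mean residual) for a discrete holding-time law.

    Uses sigma = 1 - E[(t - D)+] / <t> and
    Gamma_sg = (<t^2> - E[((t - D)+)^2]) / (2 <t>), with D the spin limit.
    """
    pairs = list(zip(holds, probs))
    m1 = sum(p * t for t, p in pairs)
    m2 = sum(p * t * t for t, p in pairs)
    ex1 = sum(p * max(t - spin_limit, 0.0) for t, p in pairs)
    ex2 = sum(p * max(t - spin_limit, 0.0) ** 2 for t, p in pairs)
    residual = m2 / (2.0 * m1)
    return 1.0 - ex1 / m1, (m2 - ex2) / (2.0 * m1), residual


if __name__ == "__main__":
    holds, probs = [10e-6, 10e-3], [0.99, 0.01]   # hypothetical: 1% preempted
    for delta in (50e-6, 20e-3):
        s, g, r = spin_model_discrete(holds, probs, delta)
        print(f"Delta={delta * 1e6:8.0f} us  sigma={s:.3f}  "
              f"Gamma_sg={g * 1e6:8.1f} us  residual={r * 1e6:.0f} us")
```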



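Normally the latch holding time distribution has an exponential tail:

$$Q(t) \sim C e^{-t/\tau}, \qquad \kappa = 1 - \sigma \sim \frac{C\tau}{\langle t\rangle}\,e^{-\Delta/\tau}, \qquad \Gamma_{sg} \sim \frac{\langle t^2\rangle}{2\langle t\rangle} - \frac{C\tau^2}{\langle t\rangle}\,e^{-\Delta/\tau}$$

It is easy to see that if the "sleep ratio" is small, κ = 1 − σ ≪ 1, then

$$\kappa(2\Delta) = \frac{\langle t\rangle}{C\tau}\,\kappa(\Delta)^2, \qquad \Gamma_{sg}(2\Delta) - \Gamma_{sg}(\Delta) = \tau\bigl(\kappa(\Delta) - \kappa(2\Delta)\bigr)$$

Doubling the spin count will square the sleep ratio coefficient. This will add only a part of order κ to the spin CPU consumption.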

I would like to paraphrase this for Oracle performance tuning purposes as:

If the "sleep ratio" for an exclusive latch is 10%, then increasing the spin count to 40000 may result in a 10 times decrease of "latch free" wait events, and only a 10% increase of CPU consumption.

In other words, if the spin is already efficient, it is worth increasing the spin count. This exponential law can be compared to Guy Harrison's experimental data [24].
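Both doubling rules can be checked on the simplest distribution with an exponential tail: exponentially distributed holding times (an assumption for illustration; then C = 1 and τ = ⟨t⟩ in the formulas above). Formula (19) gives σ = 1 − exp(−Δ/⟨t⟩) and Γ_sg = ⟨t⟩σ, so doubling Δ multiplies Γ_sg by exactly 1 + κ. With κ = 10% this reproduces the "10 times fewer sleeps for 10% more CPU" rule, while for Δ ≪ ⟨t⟩ both σ and Γ_sg nearly double. The function names are hypothetical.

```python
import math


def exponential_spin(spin_limit, mean_hold):
    """(sigma, kappa, Gamma_sg) for exponential holding times, from (19)."""
    kappa = math.exp(-spin_limit / mean_hold)   # sleep ratio 1 - sigma
    sigma = 1.0 - kappa
    return sigma, kappa, mean_hold * sigma


def doubling_effect(spin_limit, mean_hold):
    """Ratios new/old of sigma, kappa and spin CPU when Delta is doubled."""
    s1, k1, g1 = exponential_spin(spin_limit, mean_hold)
    s2, k2, g2 = exponential_spin(2.0 * spin_limit, mean_hold)
    return s2 / s1, k2 / k1, g2 / g1


if __name__ == "__main__":
    # Low efficiency: Delta is 1% of the mean holding time.
    print("low :", [round(x, 3) for x in doubling_effect(0.01, 1.0)])
    # High efficiency: Delta chosen so that the sleep ratio is 10%.
    print("high:", [round(x, 3) for x in doubling_effect(math.log(10.0), 1.0)])
```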

5.3 Long distribution tails: CPU thrashing 

A frequent origin of long latch holding time distribution tails is so-called CPU thrashing. Latch contention itself can cause CPU starvation: processes contending for a latch also contend for CPU. Vice versa, a lack of CPU power causes latch contention.

Once the CPU starves, the operating system run queue length increases and the load average exceeds the number of CPUs. Some operating systems may shrink the time quanta under such conditions. As a result, latch holders may not receive enough time to release the latch.

The latch acquirers preempt the latch holders. Throughput falls because latch holders do not receive CPU to complete their work. However, overall CPU consumption remains high. This seems to be a metastable state, observed when the server workload approaches 100% CPU. The KGX mutexes are even more prone to this transition.

Due to OS preemption, the residual latch holding time rises to the CPU scheduling scale: up to milliseconds and more. Spin count tuning is useless in this case. The common advice to prevent CPU thrashing is to tune SQL in order to reduce CPU consumption. Fixed priority OS scheduling classes will also be helpful. Future work will explore this phenomenon.

6 Conclusions 

This work investigated the possibilities to diagnose and tune latches, the most commonly used Oracle spinlocks. Using DTrace, it explored how the contemporary latch works, its spinning-blocking strategies, and the corresponding parameters and statistics. A mathematical model was developed to estimate the effect of tuning the spin count.

The results are important for precise performance tun- 
ing of highly loaded Oracle OLTP databases. 



7 Acknowledgements 

Thanks to Professor S.V. Klimenko for kindly inviting me to the MEDIAS 2011 conference.

Thanks to RDTEX CEO I.G. Kunitsky for financial 
support. Thanks to RDTEX Technical Support Centre 
Director S.P. Misiura for years of encouragement and 
support of my investigations. 



References 

[1] Oracle® Database Concepts 11g Release 2 (11.2). 2010.

[2] E. W. Dijkstra. 1965. Solution of a problem in concurrent programming control. Commun. ACM 8, 9 (September 1965), 569. DOI=10.1145/365559.365617

[3] J.H. Anderson, Yong-Jik Kim, "Shared-memory Mutual Exclusion: Major Research Trends Since 1986". 2003.

[4] T. E. Anderson. 1990. The Performance of Spin Lock Alternatives for Shared-Memory Multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1, 1 (January 1990), 6-16. DOI=10.1109/71.80120

[5] John M. Mellor-Crummey and Michael L. Scott. 1991. Algorithms for scalable synchronization on shared-memory multiprocessors. ACM Trans. Comput. Syst. 9, 1 (February 1991), 21-65. DOI=10.1145/103727.103729

[6] M. Herlihy and N. Shavit. 2008. The Art of Multiprocessor Programming. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA. ISBN: 978-0123705914. "Chapter 07 Spin Locks and Contention."

[7] T. E. Anderson, E. D. Lazowska, and H. M. Levy. 1989. The performance implications of thread management alternatives for shared-memory multiprocessors. SIGMETRICS Perform. Eval. Rev. 17, 1 (April 1989), 49-60. DOI=10.1145/75372.75378

[8] J. K. Ousterhout. Scheduling techniques for concurrent systems. In Proc. Conf. on Dist. Computing Systems, 1982.

[9] Beng-Hong Lim and Anant Agarwal. 1993. Waiting algorithms for synchronization in large-scale multiprocessors. ACM Trans. Comput. Syst. 11, 3 (August 1993), 253-294. DOI=10.1145/152864.152869

[10] Anna R. Karlin, Mark S. Manasse, Lyle A. McGeoch, and Susan Owicki. 1990. Competitive randomized algorithms for non-uniform problems. In Proceedings of the first annual ACM-SIAM symposium on Discrete algorithms (SODA '90). Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 301-309.

[11] L. Boguslavsky, K. Harzallah, A. Kreinen, K. Sevcik, and A. Vainshtein. 1994. Optimal strategies for spinning and blocking. J. Parallel Distrib. Comput. 21, 2 (May 1994), 246-254. DOI=10.1006/jpdc.1994.1056

[12] Ryan Johnson, Manos Athanassoulis, Radu Stoica, and Anastasia Ailamaki. 2009. A new look at the roles of spinning and blocking. In Proceedings of the Fifth International Workshop on Data Management on New Hardware (DaMoN '09). ACM, New York, NY, USA, 21-26. DOI=10.1145/1565694.1565700

[13] Steve Adams. 1999. Oracle8i Internal Services for Waits, Latches, Locks, and Memory. O'Reilly Media. ISBN: 978-1565925984

[14] Millsap C., Holt J. 2003. Optimizing Oracle Performance. O'Reilly & Associates. ISBN: 978-0596005276.

[15] Richmond Shee, Kirtikumar Deshpande, K. Gopalakrishnan. 2004. Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning. McGraw-Hill Osborne Media. ISBN: 978-0072227291

[16] Bryan M. Cantrill, Michael W. Shapiro, and Adam H. Leventhal. 2004. Dynamic instrumentation of production systems. In Proceedings of the annual conference on USENIX Annual Technical Conference (ATEC '04). USENIX Association, Berkeley, CA, USA, 2-2.

[17] Reiser, M. and S. Lavenberg, "Mean Value Analysis of Closed Multichain Queueing Networks," J. ACM 27 (1980) pp. 313-322

[18] Tanel Poder blog. Core IT for Geeks and Pros. http://blog.tanelpoder.com

[19] Anna R. Karlin, Kai Li, Mark S. Manasse, and Susan Owicki. 1991. Empirical studies of competitive spinning for a shared-memory multiprocessor. SIGOPS Oper. Syst. Rev. 25, 5 (September 1991), 41-55. DOI=10.1145/121133.286599

[20] L. Kleinrock, Queueing Systems, Theory, Volume I, ISBN 0471491101. Wiley-Interscience, 1975.

[21] Andrey Nikolaev blog, Latch, mutex and beyond. http://andreynikolaev.wordpress.com

[22] F. Zernike. 1929. "Weglängenparadoxon". Handbuch der Physik 4, Geiger and Scheel eds., Springer, Berlin 1929, p. 440.

[23] B. Sinharoy, et al. 1996. Improving Software MP Efficiency for Shared Memory Systems. Proc. of the 29th Annual Hawaii International Conference on System Sciences.

[24] Guy Harrison. 2008. Using _spin_count to reduce latch contention in 11g. Yet Another Database Blog. http://guyharrison.squarespace.com/

About the author

Andrey Nikolaev is an expert at RDTEX First Line Oracle Support Center, Moscow. His contact email is Andrey.Nikolaev@rdtex.ru.



